LISA Data Analysis using MCMC methods 
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The Laser Interferometer Space Antenna (LISA) is expected to simultaneously detect many thou- 
sands of low frequency gravitational wave signals. This presents a data analysis challenge that is 
very different to the one encountered in ground based gravitational wave astronomy. LISA data 
analysis requires the identification of individual signals from a data stream containing an unknown 
number of overlapping signals. Because of the signal overlaps, a global fit to all the signals has to be 
performed in order to avoid biasing the solution. However, performing such a global fit requires the 
exploration of an enormous parameter space with a dimension upwards of 50,000. Markov Chain 
Monte Carlo (MCMC) methods offer a very promising solution to the LISA data analysis problem. 
MCMC algorithms are able to efficiently explore large parameter spaces, simultaneously providing 
parameter estimates, error analysis, and even model selection. Here we present the first application 
of MCMC methods to simulated LISA data and demonstrate the great potential of the MCMC ap- 
proach. Our implementation uses a generalized F-statistic to evaluate the likelihoods, and simulated 
annealing to speed convergence of the Markov chains. As a final step we super-cool the chains to 
extract maximum likelihood estimates, and estimates of the Bayes factors for competing models. 
We find that the MCMC approach is able to correctly identify the number of signals present, ex- 
tract the source parameters, and return error estimates consistent with Fisher information matrix 
predictions. 
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I. INTRODUCTION 

The LISA observatory |l| has incredible science poten- 
tial, but that potential can only be fully realized by em- 
ploying advanced data analysis techniques. LISA will ex- 
plore the low frequency portion of the gravitational wave 
spectrum, which is thought to be home to a vast num- 
ber of sources. Since gravitational wave sources typically 
evolve on timescales that are long compared to the grav- 
itational wave period, individual low frequency sources 
will be "on" for large fractions of the nominal three year 
LISA mission lifetime. Moreover, unlike a traditional 
telescope, LISA can not be pointed at a particular point 
on the sky. The upshot is that the LISA data stream will 
contain the signals from tens of thousands of individual 
sources, and ways must be found to isolate individual 
voices from the crowd. This "Cocktail Party Problem" 
is the central issue in LISA data analysis. 

The types of sources LISA is expected to detect include 
galactic and extra-galactic compact stellar binaries, su- 
per massive black hole binaries, and extreme mass ratio 
inspirals of compact stars into supermassive black holes 
(EMRIs). Other potential sources include intermediate 
mass black hole binaries, cosmic strings, and a cosmic 
gravitational wave background produced by processes in 
the early universe. In the case of compact stellar bi- 
naries 0, H H H @ and EMRIs 00, the number of 
sources is likely to be so large that it will be impossi- 
ble to resolve all the sources individually, so that there 
will be a residual signal that is variously referred to as 
a confusion limited background or confusion noise. It 
is important that this confusion noise be made as small 
as possible so as not to hinder the detection of other 
high value targets. Several estimates of the confusion 
noise level have been made 0, IE IE IS EH EUl i and they 
all suggest that unresolved signals will be the dominant 



source of low frequency noise for LISA. However, these 
estimates are based on assumptions about the efficacy of 
the data analysis algorithms that will be used to identify 
and regress sources from the LISA data stream, and it 
is unclear at present how reasonable these assumptions 
might be. Indeed, the very notion that one can first clean 
the data stream of one type of signal before moving on 
to search for other targets is suspect as the gravitational 
wave signals from different sources are not orthogonal. 
For example, when the signal from a supermassive black 
hole binary sweeps past the signal from a white dwarf 
binary of period T, the two signals will have significant 
overlap for a time interval equal to the geometric mean 
of T and t c , where t c is the time remaining before the 
black holes merge. Thus, by a process dubbed "the white 
dwarf transform," it is possible to decompose the signal 
from a supermassive black hole binary into signals from 
a collection of white dwarf binaries. 

As described in jjnj optimal filtering of the LISA data 
would require the construction of a filter bank that de- 
scribed the signals from every source that contributes 
to the data stream. In principle one could construct a 
vast template bank describing all possible sources and 
look for the best match with the data. In practice the 
enormous size of the search space and the presence of 
unmodeled sources renders this direct approach imprac- 
tical. Possible alternatives to a full template based search 
include iterative refinement of a source-by-source search, 
ergodic exploration of the parameter space using Markov 
Chain Monte Carlo (MCMC) algorithms , Darwinian op- 
timization by genetic algorithms, and global iterative re- 
finement using the Maximum Entropy Method (MEM). 
Each approach has its strengths and weakness, and at 
this stage it is not obvious which approach will prove 
superior. 

Here we apply the popular Markov Chain Monte 
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Carlo [T2I fl3| method to simulated LISA data. This is 
not the first time that MCMC methods have been ap- 
plied to gravitational wave data analysis, but it is first 
outing with realistic simulated LISA data. Our simulated 
data streams contain the signals from multiple galactic 
binaries. Previously, MCMC methods have been used to 
study the extraction of coalescing binary [bH ] and spin- 
ning neutron star 15] signals from terrestrial interfer- 
ometers. More recently, MCMC methods have been ap- 
plied to a simplified toy problem 16] that shares some of 
the features of the LISA cocktail party problem. These 
studies have shown that MCMC methods hold consid- 
erable promise for gravitational wave data analysis, and 
offer many advantages over the standard template grid 
searches. For example, the EMPJ data analysis prob- 
lem 0, @ is often cited as the greatest challenge fac- 
ing LISA science. Neglecting the spin of the smaller 
body yields a 14 dimensional parameter space, which 
would require ~ 10 40 templates to explore in a grid based 
search Q. This huge computational cost arises because 
grid based searches scale geometrically with the param- 
eter space dimension D. In contrast, the computational 
cost of MCMC based searches scale linearly with the D. 
In fields such as finance, MCMC methods are routinely 
applied to problems with D > 1000, making the LISA 
EMRI problem seem trivial in comparison. A Google 
search on "Markov Chain Monte Carlo" returns almost 
250,000 results, and a quick scan of these pages demon- 
strates the wide range of fields where MCMC methods 
are routinely used. We found it amusing that one of 
the Google search results is a link to the PageRank 01 
MCMC algorithm that powers the Google search engine. 

The structure of the paper follows the development 
sequence we took to arrive at a fast and robust MCMC 
algorithm. In fjn]we outline the LISA data analysis prob- 
lem and the particular challenges posed by the galactic 
background. A basic MCMC algorithm is introduced in 
£11111 and applied to a full 7 parameter search for a sin- 
gle galactic binary. A generalized mult i- channel, multi- 
source F-statistic for reducing the search space from 
D = 7N to D — 3N is described in Jv| The per- 
formance of a basic MCMC algorithm that uses the F- 
statistic is studied in Jvland a number of problems with 
this simple approach are identified. A more advanced 
mixed MCMC algorithm that incorporates simulated an- 
nealing is introduced in EIVII and is successfully applied 
to multi-source searches. The issue of model selection 
is addressed in flVIII and approximate Bayes factor are 
calculated by super-cooling the Markov Chains to ex- 
tract maximum likelihood estimates. We conclude with 
a discussion of future refinements and extensions of our 
approach in Willi 



II. THE COCKTAIL PARTY PROBLEM 

Space based detectors such as LISA are able to re- 
turn several interferometer outputs |l8j |. The strains 



registered in the interferometer in response to a grav- 
itational wave pick up modulations due to the motion 
of the detector. The orbital motion introduces ampli- 
tude, frequency, and phase modulation into the observed 
gravitational wave signal. The amplitude modulation re- 
sults from the detector's antenna pattern being swept 
across the sky, the frequency modulation is due to the 
Doppler shift from the relative motion of the detector 
and source, and the phase modulation results from the 
detector's var ying response to the two gravitational wave 
polarizations [n| |2(| • These modulations encode infor- 
mation about the location of the source. The modula- 
tions spread a monochromatic signal over a bandwidth 
A/ ~ (9 + 6(//mHz) sin6>)/ m , where 9 is the co-latitude 
of the source and f m — 1/year is the modulation fre- 
quency. In the low frequency limit, where the wave- 
lengths are large compared to the armlengths of the de- 
tector, the interferometer outputs s a (t) can be combined 
to simulate the response of two independent 90 degree 
interferometers, si(t) and srr(t) , rotated by 45 degrees 
with respect to each other [l9l l2ll ] . This allows LISA to 
measure both polarizations of the gravitational wave si- 
multaneously. A third combination of signals in the low 
frequency limit yields the symmetric Sagnac variable |l8j ] , 
which is insensitive to gravitational waves and can be 
used to monitor the instrument noise. When the wave- 
lengths of the gravitational waves become comparable to 
the size of the detector, which for LISA corresponds to 
frequencies above 10 mHz, the interferometry signals can 
be combined to give three independent time series with 
comparable sensitivities pl| . 

The output of each LISA data stream can be written 

as 

JV 

s a (t) = h a (t, A) + n a (t) = J2 h i (*« %i) + n « (*) ■ (!) 

i=l 

Here h z a (t, \i) describes the response registered in detec- 
tor channel a to a source with parameters A;. The quan- 
tity h a (t,X) denotes the combined response to a collec- 
tion of N sources with total parameter vector A = ^\ A^ 
and n a (t) denotes the instrument noise in channel a. Ex- 
tracting the parameters of each individual source from 
the combined response to all sources defines the LISA 
cocktail party problem. In practice it will be impos- 
sible to resolve all of the millions of signals that con- 
tribute to the LISA data streams. For one, there will 
not be enough bits of information in the entire LISA 
data archive to describe all N sources in the Universe 
with signals that fall within the LISA band. Moreover, 
most sources will produce signals that are well below the 
instrument noise level, and even after optimal filtering 
most of these sources will have signal to noise ratios be- 
low one. A more reasonable goal might be to provide 
estimates for the parameters describing each of the N' 
sources that have integrated signal to noise ratios (SNR) 
above some threshold (such as SNR > 5), where it is now 
understood that the noise includes the instrument noise, 
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residuals from the regression of bright sources, and the 
signals from unresolved sources. 

While the noise will be neither stationary nor Gaus- 
sian, it is not unreasonable to hope that the departures 
from Gaussianity and stationarity will be mild. It is 
well know that matched filtering is the optimal linear 
signal processin g te chnique for signals with stationary 
Gaussian noise |22J, |23(. Matched filtering is used ex- 
tensively in all fields of science, and is a popular data 
analysis technique in ground based gravitational wave 
astronomy M, El Wi El El S M HI S H H3 . 
Switching to the Fourier domain, the signal can be writ- 
ten as s a (f) = h a (f,X') +n a (f), where h a (f) includes 
instrument noise and confusion noise, and the signals are 
described by parameters A'. Using the standard noise 
weighted inner product for the independent data chan- 
nels over a finite observation time T, 



(a\b) 
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~<(f)bM) + ~aM)b*M) 



a f 

a Wiener filter statistic can be defined: 



(h(X)\h(X)) 



(2) 



(3) 



The noise spectral density S n (f) is given in terms of the 
autocorrelation of the noise 



W)n*(f')) = -6 ff ,S n (f). 



(4) 



Here and elsewhere angle brackets () denote an expecta- 
tion value. An estimate for the source parameters A' can 
be found by maximizing p(A). If the noise is Gaussian 
and a signal is present, p(A) will be Gaussian distributed 
with unit variance and mean equal to the integrated sig- 
nal to noise ratio 



SNR=(p(X')) = yJ(h(X>)\h(X')). 



(5) 



The optimal filter for the LISA signal Q is a matched 
template describing all N' resolvable sources. The num- 
ber of parameters di required to describe a source ranges 
from 7 for a slowly evolving circular galactic binary to 
17 for a massive black hole binary. A reasonable esti- 
mate 10] for N' is around 10 4 , so the full parameter 
space has dimension D = J2i^i ~ 10 5 - Since the num- 
ber of templates required to uniformly cover a parameter 
space grows exponentially with D, a grid based search us- 
ing the full optimal filter is out of the question. Clearly 
an alternative approach has to be found. Moreover, the 
number of resolvable sources AT' is not known a priori, so 
some stopping criteria must be found to avoid over-fitting 
the data. 

Existing approaches to the LISA cocktail party prob- 
lem employ iterative schemes. The first such approach 
was dubbed "gCLEAN" due to its similarity with 



the "CLEAN" [42j algorithm that is used for astronom- 
ical image reconstruction. The "gCLEAN" procedure 
identifies and records the brightest source that remains 
in the data stream, then subtracts a small amount of 
this source. The procedure is iterated until a prescribed 
residual is reached, at which time the individual sources 
are reconstructed from the subtraction record. A much 
faster iterative approach dubbed "Slice & Dice" |43| was 
recently proposed that proceeds by identifying and fully 
subtracting the brightest source that remains in the data 
stream. A global least squares re-fit to all the current 
list of sources is then performed, and the new parame- 
ter record is used to produce a regressed data stream for 
the next iteration. Bayes factors are used to provide a 
stopping criteria. 

There is always the danger with iterative approaches 
that the procedure "gets off on the wrong foot," and 
is unable to find its way back to the optimal solution. 
This can happen when two signals have a high degree of 
overlap. A very different approach to the LISA source 
confusion problem is to solve for all sources simultane- 
ously using ergodic sampling techniques. Markov Chain 
Monte Carlo (MCMC) gl O is a method for estimat- 
ing the posterior distribution, p(A|s), that can be used 
with very large parameter spaces. The method is now in 
widespread use in many fields, and is starting to be used 
by astronomers and cosmologists. One of the advantages 
of MCMC is that it combines detection, parameter es- 
timation, and the calculation of confidence intervals in 
one procedure, as everything one can ask about a model 
is contained in p(X\s). Another nice feature of MCMC 
is that there are implementations that allow the number 
of parameters in the model to be variable, with built in 
penalties for using too many parameters in the fit. In 
an MCMC approach, parameter estimates from Wiener 
matched filtering are replaced by the Bayes estimator [3jJ 



Aj,(«)= / X l p(X\s)dX. 



(6) 



which requires knowledge of p(X\s) - the posterior distri- 
bution of A (i.e. the distribution of A conditioned on the 
data s). By Bayes theorem, the posterior distribution is 
related to the prior distribution p(X) and the likelihood 
p(s\X) by 



p(X\s) = 



pWp(s\x) 

J p(X')p(s\X')dX' 



(7) 



Until recently the Bayes estimator was little used in prac- 
tical applications as the integrals appearing in © and (Q 
are often analytically intractable. The traditional solu- 
tion has been to use approximations to the Bayes estima- 
tor, such as the maximum likelihood estimator described 
below, however advances in the Markov Chain Monte 
Carlo technique allow direct numerical estimates to be 
made. 
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When the noise n(t) is a normal process with zero 
mean, the likelihood is given by j3|| 



p(s\X) = Cexp 



!( ( .s-/,(A))|(.k-/,(Aii 



(8) 



where the normalization constant C is independent of 
s. In the large SNR limit the Bayes estimator can be 
approximated by finding the dominant mode of the pos- 
terior distribution, p(X\s), which Finn [36j and Cutler & 
Flannagan|2^| refer to as a maximum likelihood estima- 
tor. Other authors [3?l l38l | define the maximum likeli- 
hood estimator to be the value of A that maximizes the 
likelihood, p(s\X). The former has the advantage of incor- 
porating prior information, but the disadvantage of not 
being invariant under parameter space coordinate trans- 
formations. The latter definition corresponds to the stan- 
dard definition used by most statisticians, and while it 
does not take into account prior information, it is coordi- 
nate invariant. The two definitions give the same result 
for uniform priors, and very similar results in most cases 
(the exception being where the priors have a large gradi- 
ent at maximum likelihood). 

The standard definition of the likelihood yields an es- 
timator that is identical to Wiener matched filtering 40] . 
Absorbing normalization factors by adopting the inverted 
relative likelihood £(A) = p(s\0)/p(s\X), we have 

\ogC(X) = (s\h(X))-±(h(X)\h(X)). (9) 

In the gravitational wave literature the quantity log£(A) 
is usually referred to as the log likelihood, despite the 
inversion and rescaling. Note that 

<log£(A')) = i(p(A')) 2 = isNR 2 . (10) 

The maximum likelihood estimator (MLE), Aml, is found 
by solving the coupled set of equations d\og£/dX l = 0. 
Parameter uncertainties can be estimated from the nega- 
tive Hessian of log£, which yields the Fisher Information 
Matrix 



r«(A) 



d 2 log£(A)\ 



) = (h,i\h d ). (11) 



In the large SNR limit the MLE can be found by writ- 
ing A = A' + A A and Taylor expanding Setting 
dlogL/dAX 1 = yields the lowest order solution 

Aml = A' 1 + AA l = A'* + r«(A')(n|^) . (12) 

The expectation value of the maximum of the log likeli- 
hood is then 



(log£(A ML )) = 



SNR + D 



(13) 



This value exceeds that found in (|10fl by an amount that 
depends on the total number of parameters used in the 



fit, D, reflecting the fact that models with more param- 
eters generally give better fits to the data. Deciding how 
many parameters to allow in the fit is an important issue 
in LISA data analysis as the number of resolvable sources 
is not known a priori. This issue does not usually arise for 
ground based gravitational wave detectors as most high 
frequency gravitational wave sources are transient. The 
relevant question there is whether or not a gravitational 
wave signal is present in a section of the data stream, and 
this question can be dealt with by the Neyman-Pearson 
test or other similar tests that use thresholds on the like- 
lihood £ that are related to the false alarm and false 
dismissal rates. Demanding that £ > 1 - so it is more 
likely that a signal is present than not - and setting a 
detection threshold of p = 5 yields a false alarm prob- 
ability of 0.006 and a detection probability of 0.994 (if 
the noise is stationary and Gaussian). A simple accep- 
tance threshold of p — 5 for each individual signal used 
to fit the LISA data would help restrict the total number 
of parameters in the fit, however there are better cri- 
teria that can be employed. The simplest is related to 
the Neyman-Pearson test and compares the likelihoods of 
models with different numbers of parameters. For nested 
models this ratio has an approximately chi squared dis- 
tribution which allows the significance of adding extra 
parameters to be determined from standard statistical 
tables. A better approach is to compute the Bayes fac- 
tor, 



B 



XY 



Pxjs) 
Py{s) 



(14) 



which gives the relative weight of evidence for models X 
and Y in terms of the ratio of marginal likelihoods 



px(s)= / P (s\X,X)p(X,X)dX. 



(15) 



Here p(s\X, X) is the likelihood distribution for model X 
and p(X, X) is the prior distribution for model X. The 
difficulty with this approach is that the integral in H15|) 
is hard to calculate, though estimates can be made us- 
ing the Laplace approximation or the Bayesian Informa- 
tion Criterion (BIC) |44|. The Laplace approximation is 
based on the method of steepest descents, and for uni- 
form priors yields 



Px(s) 



> p( S \X ML ,X) 



(16) 



where j>(s|Aml,A) is the maximum likelihood for the 
model, Vx is the volume of the model's parameter space, 
and AVx is the volume of the uncertainty ellipsoid (es- 
timated using the Fisher matrix) . Models with more pa- 
rameters generally provide a better fit to the data and 
a higher maximum likelihood, but they get penalized by 
the AVx /Vx term which acts as a built in Occam's razor. 
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III. MARKOV CHAIN MONTE CARLO 

We begin by implementing a basic MCMC search 
for galactic binaries that searches over the full D = 
7N dimensional parameter space using the Metropolis- 
Hastings ^3 algorithm. The idea is to generate a set of 
samples, {x}, that correspond to draws from the poste- 
rior distribution, p(A|s). To do this we start at a ran- 
domly chosen point x and generate a Markov chain ac- 
cording to the following algorithm: Using a proposal dis- 
tribution q(-\x), draw a new point y. Evaluate the Hast- 
ings ratio 

H = p(y)p(s\y)g(x\y) 

p{x)p{s\x)q{y\x) ' 

Accept the candidate point y with probability a = 
min(l,77), otherwise remain at the current state x 
(Metropolis rejection 0]). Remarkably, this sampling 
scheme produces a Markov chain with a stationary dis- 
tribution equal to the posterior distribution of inter- 
est, p(X\s), regardless of the choice of proposal distri- 
bution |45|. A concise introduction to MCMC methods 
can be found in the review paper by Andrieu et al |47l |. 

On the other hand, a poor choice of the proposal dis- 
tribution will result in the algorithm taking a very long 
time to converge to the stationary distribution (known 
as the burn-in time). Elements of the Markov chain pro- 
duced during the burn-in phase have to be discarded as 
they do not represent the stationary distribution. When 
dealing with large parameter spaces the burn-in time can 
be very long if poor techniques are used. For example, 
the Metropolis sampler, which uses symmetric proposal 
distributions, explores the parameter space with an ef- 
ficiency of at most ~ 0.3/ D, making it a poor choice 
for high dimension searches. Regardless of the sampling 
scheme, the mixing of the Markov chain can be inhibited 
by the presence of strongly correlated parameters. Cor- 
related parameters can be dealt with by making a local 
coordinate transformation at a; to a new set of coordi- 
nates that diagonalises the Fisher matrix, (x) . 

We tried a number of proposal distributions and up- 
date schemes to search for a single galactic binary. The 
results were very disappointing. Bold proposals that at- 
tempted large jumps had a very poor acceptance rate, 
while timid proposals that attempted small jumps had a 
good acceptance rate, but they explored the parameter 
space very slowly, and got stuck at local modes of the pos- 
terior. Lorentzian proposal distributions fared the best 
as their heavy tails and concentrated peaks lead to a mix- 
ture of bold and timid jumps, but the burn in times were 
still very long and the subsequent mixing of the chain was 
torpid. The MCMC literature is full of similar examples 
of slow exploration of large parameter spaces, and a host 
of schemes have been suggested to speed up the burn-in. 
Many of the accelerated algorithms use adaptation to 
tune the proposal distribution. This violates the Markov 
nature of the chain as the updates depend on the his- 
tory of the chain. More complicated adaptive algorithms 



have been invented that restore the Markov property by 
using additional Metropolis r eject ion steps. The popu- 
lar Delayed Rejection Method [43 and Reversible Jump 
Method 0] are examples of adaptive MCMC algorithms. 
A simpler approach is to use a non-Markov scheme dur- 
ing burn-in, such as adaptation or simulated annealing, 
then transition to a Markov scheme after burn-in. Since 
the burn-in portion of the chain is discarded, it does not 
matter if the MCMC rules are broken (the burn-in phase 
is more like Las Vegas than Monte Carlo). 

Before resorting to complex acceleration schemes we 
tried a much simpler approach that proved to be very 
successful. When using the Metropolis-Hastings algo- 
rithm there is no reason to restrict the updates to a single 
proposal distribution. For example, every update could 
use a different proposal distribution so long as the choice 
of distribution is not based on the history of the chain. 
The proposal distributions to be used at each update can 
be chosen at random, or they can be applied in a fixed 
sequence. Our experience with single proposal distribu- 
tions suggested that a scheme that combined a very bold 
proposal with a very timid proposal would lead to fast 
burn-in and efficient mixing. For the bold proposal we 
chose a uniform distribution for each of the source param- 
eters A — > (A, /, 9, 4>, ip, l, ipo). Here A is the amplitude, / 
is the gravitational wave frequency, 9 and <f> are the eclip- 
tic co-latitude and longitude, ip is the polarization angle, 
i is the inclination of the orbital plane, and tpo is the or- 
bital phase at some fiducial time. The amplitudes were 
restricted to the range A 6 [10~ 23 , 10~ 21 ] and the fre- 
quencies were restricted to lie within the range of the data 
snippet / € [0.999995, 1.003164] mHz (the data snippet 
contained 100 frequency bins of width A/ = 1/year). A 
better choice would have been to use a cosine distribu- 
tion for the co-latitude 9 and inclination t, but the choice 
is not particularly important. When multiple sources 
were present each source was updated separately dur- 
ing the bold proposal stage. For the timid proposal we 
used a normal distribution for each eigendirection of the 
Fisher matrix, Tij(x). The standard deviation for 
each eigendirection k was set equal to a- k = Xj^Ja^D, 
where at is the corresponding eigenvalue of I\j (x) , and 
D = 7N is the search dimension. The factor of 1/VD 
ensures a healthy acceptance rate as the typical total 
jump is then ~ la. All N sources were updated simul- 
taneously during the timid proposal stage. Note that 
the timid proposal distributions are not symmetric since 
Tij (x) ^ Tij (y) . One set of bold proposals (one for each 
source) was followed by ten timid proposals in a repeating 
cycle. The ratio of the number of bold to timid propos- 
als impacted the burn-in times and the final mixing rate, 
but ratios anywhere from 1:1 to 1:100 worked well. We 
used uniform priors, p(x) = const., for all the parameters, 
though once again a cosine distribution would have been 
better for 9 and I. Two independent LISA data channels 
were simulated directly in the frequency domain using the 
method described in Ref. with the sources chosen at 
random using the same uniform distributions employed 
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TABLE I: 7 parameter MCMC search for a single galactic 
binary 





A (1(T 22 ) 
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0.14 


2.2e-06 
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0.051 
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0.050 


0.22 


CMCMC 


0.14 


2.4e-06 


0.089 


0.055 


0.058 


0.052 


0.23 



by the bold proposal. The data covers 1 year of observa- 
tions, and the data snippet contains 100 frequency bins 
(of width 1 /year) . The instrument noise was assumed to 
be stationary and Gaussian, with position noise spectral 
density S% os = 4 x 10~ 22 m 2 Hz _1 and acceleration noise 
spectral density 5^ cccl = 9 x 10~ 30 m 2 s _4 Hz _1 . 

Table [I] summarizes the results of one MCMC run us- 
ing a model with one source to search for a single source 
in the data snippet. Burn- in lasted ~ 2000 iterations, 
and post burn-in the chain was run for 10 6 iterations 
with a proposal acceptance rate of 77% (the full run 
took 20 minutes on a Mac G5 2 GHz processor). The 
chain was used to calculate means and variances for all 
the parameters. The parameter uncertainty estimates 
extracted from the MCMC output are compared to the 
Fisher matrix estimates evaluated at the mean values of 
the parameters. The source had true SNR = 12.9, and 
MCMC recovered SNR = 10.7. Histograms of the poste- 
rior parameter distributions are shown in Figure^ where 
they are compared to the Gaussian approximation to the 
posterior given by the Fisher matrix. The agreement is 
impressive, especially considering that the bandwidth of 
the source is roughly 10 frequency bins, so there are very 
few noise samples to work with. Similar results were 
found for other MCMC runs on the same source, and for 
MCMC runs with other sources. Typical burn-in times 
were of order 3000 iterations, and the proposal accep- 
tance rate was around 75%. 

The algorithm was run successfully on two and three 
source searches (the model dimension was chosen to 
match the number of sources in each instance), but on 
occasions the chain would get stuck at a local mode of 
the posterior for a large number of iterations. Before 
attempting to cure this problem with a more refined 
MCMC algorithm, we decided to eliminate the extrinsic 
parameters A,i,ip,eo from the search by using a multi- 
filter generalized F-statistic. This reduces the search di- 
mension to D = 3N, with the added benefit that the 
projection onto the (/, 8, <f>) sub-space yields a softer tar- 
get for the MCMC search. 



IV. GENERALIZED F-STATISTIC 

The F-statistic was originally introduced (3^ in the 
context of ground based searches for gravitational wave 
signals from rotating Neutron stars. The F-statistic has 
since been used to search for monochromatic galactic bi- 
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FIG. 1: Histograms showing the posterior distribution (grey) 
of the parameters. Also shown (black line) is the Gaussian ap- 
proximation to the posterior distribution based on the Fisher 
information matrix. The mean values have been subtracted, 
and the parameters have been scaled by the square root of 
the variances calculated from the MCMC chains. 



naries using simulated LISA data |43j, |51j . By using mul- 
tiple linear filters, the F-statistic is able to automatically 
extremize the log likelihood over extrinsic parameters, 
thus reducing the dimension of the search space (the pa- 
rameter space dimension remains the same). 

In the low-frequency limit the LISA response to a grav- 
itational wave with polarization content ft>+(£), h x (t) can 
be written as 



h(t) = h+(t)F+(t) + h x (t)F x (t), (18) 



where 



F+(t) = ~ (cos 2^> £>+(*) -sin 2^-D x (i)) 

F x (t) = i (sin 2?/> £>+(£) + cos 2^1? x (f)) (19) 

The detector pattern functions D + (t) and D x (t) are 
given in equations (36) and (37) of Ref.|5C|. To leading 
post-Newtonian order a slowly evolving, circular binary 
has polarization components 

h+(t) = A(l + cos 2 l) cos($(i) + tpo) 

h x (t) = ~2Acosl sin($(t) +tp ). (20) 

The gravitational wave phase 

*(t;/,M) = 2n ft + 2nfA\J sinO cos(2TTf m t-^), (21) 
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couples the sky location and the frequency through the 
term that depends on the radius of LISA's orbit, 1 AU, 
and the orbital modulation frequency, f m = 1/ year. The 
gravitational wave amplitude, A, is effectively constant 
for the low frequency galactic sources we are considering. 
Using these expressions Ijl8jl can be written as 



where 



h(t) = <*<(4 i>> h Vo)A l (t- f, 6, 



(22) 



where the time-independent amplitudes are given by 

a\ = — ((1 + cos 2 l) cos^o cos 2?/; — 2 cos i sin sin 2?/;) , 

a-i = — — (2 cos l sin tpo cos 2ip + (1 + cos 2 t) cos ipo sin 2ift) , 

a 3 = — — (2 cos i cos tpo sin 2-0 + (1 + cos 2 i) sintpo cos2?/>) , 

0,4 = — ((1 + cos 2 l) sin tpo sin 2ip — 2 cos i cos tpo cos 2-0) , 

(23) 

and the time-dependent functions A z (i) are given by 





= D+{t;6, 


0)cos$(t;/,M) 




A\t) 


= D x (t;0, 


0) cos $(0/, 6,0) 




A\t) 


= D+(t;9, 


0)sin$(t;/,0,0) 




A\t) 


= D x (t;9, 


0) sin*(f;/,M). 


(24) 



Defining the four constants TV 1 = (sl^l 1 ) and using 122|) 
yields a solution for the amplitudes af. 



(M-%N> , 



(25) 



where M 4 - 7 = The output of the four linear 

filters, N*, and the 4x4 matrix M y can be calculated 
using the same fast Fourier space techniques used 
to generate the full waveforms. Substituting (1221) and 
(|25|l into expression 10 for the log likelihood yields the 
F-statistic 



F = \ogC = ^{M- l ) ij N i Ni 



(26) 



The F-statistic automatically maximizes the log likeli- 
hood over the extrinsic parameters A, t, tp and <po, and 
reduces the search to the sub-space spanned by /, 9 and 
<p. The extrinsic parameters can be recovered from the 
a,'s via 



A A 



A 



Al 



1 ^ A + a4-A x a! 

— arctan 
2 



(A x a 2 + A + a 3 ) / 



arccos 



ipo = arctan 



c(A + q 4 - A x ai) 
-c(A x d2 + ^+03), 



(27) 



\J [ax + a 4 ) 2 + (a 2 - a 3 ) 2 
+ V ( a i - a 4) 2 + ( a 2 + a 3 ) 2 



^4+ 

A x = V ( a i + a 4) 2 + (02 - a 3 ) 



- V( a l a ±) 2 + ( a 2 + «3) 2 

sign(sin(20>)) . 



(28) 



The preceding description of the F-statistic automat- 
ically incorporates the two independent LISA channels 
through the use of the dual-channel noise weighted inner 
product (a\b). The basic F-statistic can easily be gener- 
alized to handle N sources. Writing i = 4K + 1, where K 
labels the source and I = 1 — > 4 labels the four filters for 
each source, the F-statistic (|26|) keeps the same form as 
before, but now there are 47V linear filters N l , and M 4J 
is a 4A x 4A dimensional matrix. For slowly evolving 
galactic binaries, which dominate the confusion problem, 
the limited bandwidth of each individual signal means 
that the M u is band diagonal, and thus easily inverted 
despite its large size. 

Since the search is now over the projected sub-space 
{fj,9ji4>j} °f the full parameter space, the full Fisher 
matrix, Tij(x), is replaced by the projected Fisher ma- 
trix, jij(x). The projection of the fc th parameter is given 
by 



r' ; 



1 ik L jk 
1 kk 



(29) 



where n denotes the dimension of the projected ma- 
trix. Repeated application of the above projection yields 
jij = Tfj . Inverting jij yields the same uncertainty 
estimates for the intrinsic parameters as one gets from 
the full Fisher matrix, but the covariances are much 
larger. The large covariances make it imperative that the 
proposal distributions use the eigenvalues and eigenvec- 
tors of jij , as using the parameter directions themselves 
would lead to a slowly mixing chain. 



V. F-STATISTIC MCMC 

We implemented an F-statistic based MCMC algo- 
rithm using the approach described in ijllll but with 
the full likelihood replaced by the F-statistic and the full 
Fisher matrix replaced by the projected Fisher matrix. 
Applying the F-MCMC search to the same data set as 
before yields the results summarized in Figure [21 and Ta- 
ble [H] The recovered source parameters and signal-to- 
noise ratio (SNR = 10.4) are very similar to those found 
using the full 7-parameter search, but the F-MCMC esti- 
mates for the errors in the extrinsic parameters are very 
different. This is because the chain does not explore ex- 
trinsic parameters, but rather relies upon the F-statistic 
to find the extrinsic parameters that give the largest log 
likelihood based on the current values for the intrinsic 



8 



TABLE II: F-MCMC search for a single galactic binary 





A (HT 22 ) 


/ (mHz) 


e 


<$> 


4> 


l 


<po 


ATrue 


1.73 


1.0005853 


0.98 


4.46 


2.55 


1.47 


0.12 


Af-MCMC 


1.38 


1.0005835 


1.09 


4.42 


2.56 


1.51 


0.17 


OFisher 


0.14 


2.2e-06 


0.089 


0.052 


0.055 


0.051 


0.22 


ffMCMC 


0.02 


2.5e-06 


0.093 


0.056 


0.027 


0.016 


0.21 



parameters. The effect is very pronounced in the his- 
tograms shown in Figure [5] Similar results were found 
for other F-MCMC runs on the same source, and for F- 
MCMC runs with other sources. Typical burn-in times 
were of order 1000 iterations, and the proposal accep- 
tance rate was around 60%. As expected, the F-MCMC 
algorithm gave shorter burn-in times than the full pa- 
rameter MCMC, and a comparable mixing rate. 
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FIG. 2: Histograms showing the posterior distribution (grey) 
of the parameters. Also shown (black line) is the Gaussian ap- 
proximation to the posterior distribution based on the Fisher 
information matrix. The mean values have been subtracted, 
and the parameters have been scaled by the square root of 
the variances calculated from the F-MCMC chains. 

It is interesting to compare the computational cost of 
the F-MCMC search to a traditional F-Statistic based 
search on a uniformly spaced template grid. To cover 
the parameter space of one source (which for the cur- 
rent example extends over the full sky and 100 frequency 
bins) with a minimal match |3J| of MM = 0.9 requires 
39,000 templates [H- A typical F-MCMC run uses less 
than 1000 templates to cover the same search space. The 
comparison becomes even more lopsided if we consider si- 



multaneous searches for multiple sources. A grid based 
simultaneous search for two sources using the F-statistic 
would take (39,000) 2 ~ 1.5 x 10 9 templates, while the 
basic F-MCMC algorithm typically converges on the two 
sources in just 2000 steps. As the number of sources 
in the model increases the computation cost of the grid 
based search grows geometrically while the cost of the 
F-MCMC search grows linearly. It is hard to imagine 
a scenario (other than quantum computers) where non- 
iterative grid based searches could play a role in LISA 
data analysis. 




FIG. 3: Trace plots of the sky location parameters for two 
F-MCMC runs on the same data set. Both chains initially 
locked onto a secondary mode of the posterior, but one of the 
chains (light colored line) transitioned to the correct mode 
after 13,000 iterations. 



While testing the F-MCMC algorithm on differ- 
ent sources we came across instances where the 
chain became stuck at secondary modes of the pos- 
terior. A good example occurred for a source with 
parameters (A, /, 6, </>, ip, l, (p )=(lAe-22, 1.0020802 mHz, 
0.399,5.71,1.3,0.96,1.0) and SNR = 16.09. Most 
MCMC runs returned good fits to the source pa- 
rameters, with an average log likelihood of ln£ = 
132, mean intrinsic parameter values (/, 9, <j)) = 
(1.0020809 mHz, 0.391, 5.75) and SNR = 16.26. How- 
ever, some runs locked into a secondary mode with av- 
erage log likelihood ln£ = 100, mean intrinsic param- 
eter values (f,9,(j)) = (1.0020858 mHz, 2.876, 5.20) and 
SNR = 14.15. It could sometimes take hundreds of thou- 
sands of iterations for the chain to discover the dominant 
mode. Figure 01 shows plots of the (inverted) likelihood 
C and the log likelihood ln£ as a function of sky loca- 
tion for fixed / = 1.0020802 mHz. The log likelihood 
plot reveals the problematic secondary mode near the 
south pole, while the likelihood plot shows just how small 
a target the dominant mode presents to the F-MCMC 
search. Similar problems with secondary modes were en- 
countered in the f — 4> plane, where the chain would get 
stuck a full bin away from the correct frequency. These 
problems with the basic F-MCMC algorithm motivated 
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the embellishments described in the following section. 
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FIG. 4: Inverted likelihood and log likelihood as a function of 
sky location at fixed frequency. 



VI. MULTIPLE PROPOSALS AND HEATING 

The LISA data analysis problem belongs to a partic- 
ularly challenging class of MCMC problems known as 
"mixture models." As the name suggests, a mixture 
model contains a number of components, some or all of 
which may be of the same type. In our present study 
all the components are slowly evolving, circular binaries, 
and each component is described by the same set of seven 
parameters. There is nothing to stop two components in 
the search model from latching on to the same source, nor 
is there anything to stop one component in the search 
model from latching on to a blend of two overlapping 
sources. In the former instance the likelihood is little 
improved by using two components to model one source, 
so over time one of the components will tend to wander 
off in search of another source. In the latter instance it 
may prove impossible for any data analysis method to 
de-blend the sources (the marginal likelihood for the sin- 
gle component fit to the blended sources may exceed the 
marginal likelihood of the "correct" solution). 

The difficulties we encountered with the single source 
searches getting stuck at secondary modes of the pos- 
terior are exacerbated in the multi-source case. Source 
overlaps can create additional secondary modes that are 



not present in the non-overlapping case. We employed 
two techniques to speed burn-in and to reduce the chance 
of the chain getting stuck at a secondary mode: simu- 
lated annealing and multiple proposal distributions. Sim- 
ulated annealing works by softening the likelihood func- 
tion, making it easier for the chain to move between 
modes. The likelihood © can be thought of as a parti- 
tion function Z = C exp(—f3E) with the "energy" of the 
system given by E — [s — h\s — h) and the "inverse tem- 
perature" equal to (3 = 1/2. Our goal is to find the tem- 
plate h that minimizes the energy of the system. Heat- 
ing up the system by setting (3 < 1/2 allows the Markov 
Chain to rapidly explore the likelihood surface. We used 
a standard power law cooling schedule: 



/3 = 



t/T c 



0<t<T c 
t > T, 



(30) 



where t is the number of steps in the chain, T c is the 
cooling time and Po is the initial inverse temperature. It 
took some trial and error to find good values of T c and (3q. 
If some of the sources have very high SNR it is a good idea 
to start at a high temperature (3q ~ 1/50, but in most 
cases we found (3q — 1/10 to be sufficient. The optimal 
choice for the cooling time depends on the number of 
sources and the initial temperature. We found that it 
was necessary to increase T c roughly linearly with the the 
number of sources and the initial temperature. Setting 
T c = 10 5 for a model with N — 10 sources and an initial 
temperature of /3o = 1/10 gave fairly reliable results, but 
it is always a good idea to allow longer cooling times if 
the computational resources are available. The portion 
of the chain generated during the annealing phase has to 
be discarded as the cooling introduces an arrow of time 
which necessarily violates the reversibility requirement of 
a Markov Chain. 

After cooling to (3 = 1/2 the chain can explore the like- 
lihood surface for the purpose of extracting parameter es- 
timates and error estimates. Finally, we can extract max- 
imum likelihood estimates by "super cooling" the chain 
to some very low temperature (we used (3 ~ 10 4 ). 

The second ingredient in our advanced F-MCMC al- 
gorithm is a large variety of proposal distributions. 
We used the following types of proposal distribution: 
Uniform(-, x, i) - a uniform draw on all the parameters 
that describe source i, using the full parameter ranges, 
with all other sources held fixed; Normal(-,af) - a multi- 
variate normal distribution with variance-covariance ma- 
trix given by 3N x 7(2;); Sky(-, x*,i) - a uniform draw on 
the sky location for source i; cr-Uniform(-, x, i) - a uniform 
draw on all the parameters that describe source i, using 
a parameter range given by some multiple of the stan- 
dard deviations given by j(x). The Uniform(-, x, i) and 
Normal(-, x) proposal distributions are the same as those 
used in the basic F-MCMC algorithm. The Sky(-,f, i) 
proposal proved to be very useful at getting the chain 
away from secondary modes like the one seen in Figure^ 
while the <7-Uniform(-, x, i) proposal helped to move the 
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chain from secondary modes in the / — <p or / — 6 planes. 
During the initial annealing phase the various proposal 
distributions were used in a cycle with one set of the bold 
distributions (Uniform, Sky and er— Uniform) for every 
10 draws from the timid multivariate normal distribu- 
tion. During the main MCMC run at j3 = 1/2 the ratio 
of timid to bold proposals was increased by a factor of 
10, and in the final super-cooling phase only the timid 
multivariate normal distribution was used. 

The current algorithm is intended to give a proof of 
principle, and is certainly far from optimal. Our choice 
of proposal mixtures was based on a few hundred runs 
using several different mixtures. There is little doubt 
that a better algorithm could be constructed that uses a 
larger variety of proposal distributions in a more optimal 
mixture. 

The improved F-MCMC algorithm was tested on a 
variety of simulated data sets that included up to 10 
sources in a 100 bin snippet (once again we are us- 
ing one year of observations). The algorithm performed 
very well, and was able to accurately recover all sources 
with SNR > 5 so long as the degree of source cor- 
relation was not too large. Generally the algorithm 
could de-blend s ources that had correlation coefficients 
C12 = (/ii|/i 2 )/\/Oi|^i)(fr2M below 0.3. A full inves- 
tigation of the de-blending of highly correlated sources is 
deferred to a subsequent study. For now we present one 
representative example from the 10 source searches. 
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FIG. 5: Simulated LISA data with 10 galactic binaries. The 
solid lines show the total signal plus noise, while the dashed 
lines show the instrument noise contribution. 

A set of 10 galactic sources was randomly selected from 
the frequency range / € [0.999995, 1.003164] mHz and 
their signals were processed through a model of the LISA 
instrument response. The root spectral densities in the 
two independent LISA data channels are shown in Fig- 
ure and the source parameters are listed in Table ITTT1 
Note that one of the sources had a SNR below 5. The 
data was then search using our improved F-MCMC al- 
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FIG. 6: Trace plots of the frequencies for two of the model 
sources. During the annealing phase < 10 5 ) the chain 
explores large regions of parameter space. The inset shows a 
zoomed in view of the chain during the MCMC run at j3 = 1/2 
and the final super cooling which starts at # = 1.2 x 10 . 



TABLE III: F-MCMC search for 10 galactic binaries using a 
model with 10 sources. The frequencies are quoted relative to 
1 mHz as / = 1 mHz + 8f with Sf in (jHz. 





SNR 


A (HT 22 ) 


Sf 


e 


<t> 


V> 


L 




True 


8.1 


0.56 


0.623 


1.18 


4.15 


2.24 


2.31 


1.45 


MCMC ML 


8.6 


0.76 


0.619 


1.13 


4.16 


1.86 


2.07 


1.01 


True 


9.0 


0.47 


0.725 


0.80 


0.69 


0.18 


0.21 


2.90 


MCMC ML 


11.3 


0.97 


0.725 


0.67 


0.70 


0.82 


0.99 


1.41 


True 


5.1 


0.46 


0.907 


2.35 


0.86 


0.01 


2.09 


2.15 


MCMC ML 


6.0 


0.67 


0.910 


2.07 


0.61 


3.13 


1.88 


2.28 


True 


8.3 


1.05 


1.126 


1.48 


2.91 


0.46 


1.42 


1.67 


MCMC ML 


6.9 


0.75 


1.114 


1.24 


3.01 


0.40 


1.26 


2.88 


True 


8.2 


0.54 


1.732 


1.45 


0.82 


1.58 


0.79 


2.05 


MCMC ML 


7.7 


0.77 


1.730 


1.99 


0.69 


1.27 


1.18 


2.73 


True 


14.7 


1.16 


1.969 


1.92 


0.01 


1.04 


2.17 


5.70 


MCMC ML 


12.8 


1.20 


1.964 


1.97 


6.16 


0.97 


2.00 


6.15 


True 


4.9 


0.41 


2.057 


2.19 


1.12 


1.04 


2.13 


3.95 


MCMC ML 


5.2 


0.66 


1.275 


0.57 


2.81 


0.57 


1.82 


3.93 


True 


8.8 


0.85 


2.186 


2.21 


4.65 


3.13 


2.01 


4.52 


MCMC ML 


10.0 


1.01 


2.182 


2.43 


5.06 


0.26 


2.00 


5.54 


True 


7.6 


0.58 


2.530 


2.57 


0.01 


0.06 


0.86 


0.50 


MCMC ML 


6.7 


0.98 


2.582 


2.55 


6.03 


2.71 


1.52 


5.58 


True 


11.7 


0.69 


2.632 


1.17 


3.14 


0.45 


2.53 


0.69 


MCMC ML 


13.5 


1.39 


2.627 


1.55 


3.07 


3.08 


1.94 


6.07 



gorithm using a model with 10 sources (70 parameters). 
The annealing time was set at 10 5 steps, and this was fol- 
lowed by a short MCMC run of 2 x 10 4 steps and a super 
cooling phase that lasted 2 x 10 4 steps. The main MCMC 
run was kept short as we were mostly interested in ex- 
tracting maximum likelihood estimates. Figure [fj] shows 
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a trace plot of the chain that focuses on the frequencies 
of two of the model sources. During the early hot phase 
the chain moves all over parameter space, but as the sys- 
tem cools to (3 = 1/2 the chain settles down and locks 
onto the sources. During the final super cooling phase 
the movement of the chain is exponentially damped as 
the model is trapped at a mode of shrinking width and 
increasing height. 

The list of recovered sources can be found in Table Infl 
The low SNR source (SNR = 4.9) was not recovered, but 
because the model was asked to find 10 sources it instead 
dug up a spurious source with SNR = 5.2. With two ex- 
ceptions, the intrinsic parameters for the other 9 sources 
were recovered to within 3cr of the true parameters (us- 
ing the Fisher matrix estimate of the parameter recovery 
errors). The two misses were the frequency of the source 
at / = 1.00253 mHz (out by 19c) and the co-latitude 
of the the source at / = 1.002632 mHz (out by 6cr). It 
is no co-incidence that these misses occurred for the two 
most highly correlated sources (C^io = —0.23). The full 
source cross-correlation matrix is listed in (I31|) . 



(hi\hj 



V( h *\ h i)( h j\ h j) 

( 1 0.08 0.01 
0.08 1 0.02 0.01 



0.02 1 
0.01 0.01 -0.06 














V o 



-0.06 

1 

1 















1 

0.03 

0.01 0.03 -0.05 








0.01 



0.03 0.03 
1 -0.05 
1 








II 







II 
















1 

-0.23 












-0.23 
1 



(31) 



The MCMC derived maximum likelihood estimates for 
the the source parameters can be used to regress the 
sources from the data streams. Figure compares the 
residual signal to the instrument noise. The total resid- 
ual power is below the instrument noise level as some of 
the noise has been incorporated into the recovered sig- 
nals. 



VII. MODEL SELECTION 

In the preceding examples we used models that had 
the same number of components as there were sources in 
the data snippet. This luxury will not be available with 
the real LISA data. A realistic data analysis procedure 
will have to explore model space as well as parameter 
space. It is possible to generalize the MCMC approach 
to simultaneously explore both spaces by incorporating 
trans-dimensional moves in the proposal distributions. In 
other words, proposals that change the number of sources 
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FIG. 7: The LISA data channels with the sources regressed 
using the maximum likelihood parameter estimates from the 
F-MCMC search. The solid lines show the residuals, while 
the dashed lines show the instrument noise contribution. 



being used in the fit. One popular method for doing this 
is Reverse Jump MCMC |49j, but there are other sim- 
pler methods that can be used. When trans-dimensional 
moves are built into the MCMC algorithm the odds ratio 
for the competing models is given by the fraction of the 
time that the chain spends exploring each model. While 
trans-dimensional searches provide an elegant solution to 
the model determination problem in principle, they can 
perform very poorly in practice as the chain is often re- 
luctant to accept a trans-dimensional move. 

A simpler alternative is to compare the outputs of 
MCMC runs using models of fixed dimension. The odds 
ratio can then calculated using Bayes factors. Calculat- 
ing the marginal likelihood of a model is generally very 
difficult as it involves an integral over all of parameter 
space: 



Px(s)= / p(s\X,X)p(X,X)dX. 



(32) 



Unfortunately, this integrand is not weighted by the pos- 
terior distribution, so we cannot use the output of the 
MCMC algorithm to compute the integral. When the 
likelihood distribution has a single dominant mode, the 
integrand can be approximated using the Laplace approx- 
imation: 



p(X, X)p{s\X, X) ~ p(A ML , X) p {s\Xmu X) 

where F is given by the Hessian 

d 2 ln(p(X,X)p(s\X,X)) 



d\jd\i 



A-Aml 



(33) 



(34) 
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TABLE IV: F-MCMC search for 10 galactic binaries using a 
model with 9 sources. The frequencies are quoted relative to 
1 mHz as / = 1 mHz + Sf with Sf in /iHz. 





SNR 


A (lO" 22 ) 


Sf 


e 


<t> 




l 


<Po 


True 


8.1 


0.56 


0.623 


1.18 


4.15 


2.24 


2.31 


1.45 


MCMC ML 


8.6 


0.77 


0.619 


1.12 


4.16 


1.86 


2.06 


1.02 


True 


9.0 


0.47 


0.725 


0.80 


0.69 


0.18 


0.21 


2.90 


MCMC ML 


11.3 


0.95 


0.725 


0.67 


0.70 


0.84 


0.98 


1.30 


True 


5.1 


0.46 


0.907 


2.35 


0.86 


0.01 


2.09 


2.15 


MCMC ML 


6.2 


0.75 


0.910 


2.09 


0.61 


3.09 


1.82 


2.21 


True 


8.3 


1.05 


1.126 


1.48 


2.91 


0.46 


1.42 


1.67 


MCMC ML 


7.0 


0.81 


1.112 


1.24 


2.95 


0.45 


1.31 


2.55 


True 


8.2 


0.54 


1.732 


1.45 


0.82 


1.58 


0.79 


2.05 


MCMC ML 


7.5 


0.76 


1.730 


1.95 


0.70 


1.23 


1.19 


2.68 


True 


14.7 


1.16 


1.969 


1.92 


0.01 


1.04 


2.17 


5.70 


MCMC ML 


12.9 


1.23 


1.965 


1.97 


6.17 


0.99 


1.99 


6.11 


True 


4.9 


0.41 


2.057 


2.19 


1.12 


1.04 


2.13 


3.95 


MCMC ML 


















True 


8.8 


0.85 


2.186 


2.21 


4.65 


3.13 


2.01 


4.52 


MCMC ML 


10.0 


1.00 


2.182 


2.41 


5.05 


0.23 


2.00 


5.59 


True 


7.6 


0.58 


2.530 


2.57 


0.01 


0.06 


0.86 


0.50 


MCMC ML 


5.8 


0.72 


2.536 


0.58 


5.70 


3.04 


1.52 


4.64 


True 


11.7 


0.69 


2.632 


1.17 


3.14 


0.45 


2.53 


0.69 


MCMC ML 


13.2 


1.21 


2.631 


1.41 


2.97 


0.46 


2.02 


0.50 



When the priors p(X, X) are uniform or at least slowly 
varying at maximum likelihood, Fij is equal to the Fisher 
matrix Fy . The integral is now straightforward and 
yields 

(2tt) d / 2 

p X {s) ~ p{\ML,X)p(s\\ M L,X)—j—j^- . (35) 

With uniform priors p(Xml, X)=l/V , where V is the vol- 
ume of parameter space, and (2t:) d ^ 2 /detF—AV , where 
Ay is the volume of the error ellipsoid. 

To illustrate how the Bayes factor can be used in model 
selection, we repeated the F-MCMC search described in 
the previous section, but this time using a model with 

9 sources. The results of a typical run are presented in 
Table Hvl The parameters of the 9 brightest sources were 
all recovered to within 3cr of the input values, save for the 
sky location of the source with frequency / ~ 1.00253 
mHz. It appears that confusion with the source at / = 
1.002632 mHz may have caused the chain to favour a 
secondary mode like the one seen in Figure 0] Using 
ff 351) to estimate the marginal likelihoods for the 9 and 

10 parameter models we found lnpg(s) = —384.3 and 
lnpio(s) = —394.9, which gives an odds ratio of 1 : 4x 10 4 
in favour of the 9 parameter model. In contrast, a naive 
comparison of log likelihoods, ln£ 9 = 413.1 and ln£i = 
425.7 would have favoured the 10 parameter model. 

It is also interesting to compare the output of the 10 
source MCMC search to the maximum likelihood one gets 



by starting at the true source parameters then applying 
the super cooling procedure (in other words, cheat by 
starting in the neighborhood of the true solution). We 
found p c hcat(s) = -394.5, and ln£ cho at = 421.5, which 
tells us that the MCMC solution, while getting two of the 
source parameters wrong, provides an equally good fit to 
the data. In other words, there is no data analysis algo- 
rithm that can fully deblend the two highly overlapping 
sources. 



VIII. CONCLUSION 

Our first pass at applying the MCMC method to LISA 
data analysis has shown the method to have considerable 
promise. The next step is to push the existing algorithm 
until it breaks. Simulations of the galactic background 
suggest that bright galactic sources reach a peak density 
of one source per five 1/year frequency bins |10|. We have 
shown that our current F-MCMC algorithm can handle 
a source density of one source per ten frequency bins 
across a one hundred bin snippet. We have yet to try 
larger numbers of sources as the current version of the 
algorithm employs the full D = 7N dimensional Fisher 
matrix in many of the updates, which leads to a large 
computational overhead. We are in the process of modi- 
fying the algorithm so that sources are first grouped into 
blocks that have strong overlap. Each block is effectively 
independent of the others. This allows each block to be 
updated separately, while still taking care of any strongly 
correlated parameters that might impede mixing of the 
chain. We have already seen some evidence that high 
local source densities pose a challenge to the current al- 
gorithm. The lesson so far has been that adding new, 
specially tailored proposal distributions to the mix helps 
to keep the chain from sticking at secondary modes of 
the posterior (it takes a cocktail to solve the cocktail 
party problem). On the other hand, we have also seen 
evidence of strong multi-modality whereby the secondary 
modes have likelihoods within a few percent of the global 
maximum. In those cases the chain tends to jump back 
and forth between modes before being forced into a de- 
cision by the super-cooling process that follows the main 
MCMC run. Indeed, we may already be pushing the lim- 
its of what is possible using any data analysis method. 
For example, the 10 source search used a model with 
70 parameters to fit 400 pieces of data (2 channels x 2 
Fourier components x 100 bins). One of our goals is to 
better understand the theoretical limits of what can be 
achieved so that we know when to stop trying to improve 
the algorithm! 

It would be interesting to compare the performance of 
the different methods that have been proposed to solve 
the LISA cocktail party problem. Do iterative methods 
like gCLEAN and Slice & Dice or global maximization 
methods like Maximum Entropy have different strengths 
and weakness compared to MCMC methods, or do they 
all fail in the same way as they approach the confusion 
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limit? It may well be that methods that perform better 
with idealized, stationary, Gaussian instrument noise will 
not prove to be the best when faced with real instrumen- 
tal noise. 
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